Abstract

The concept of a configuration graph associated to a primitive, aperiodic
substitution is introduced in [1] as a convenient graphical representation
of the infinite indeterminism of the shift space of the substitution.
The main result of [1] is an algorithm to calculate this graph from the
substitution; in this paper we turn the tables and produce substitutions
from graphs. We do this using the Zorro algorithm, an entirely constructive
and easily applicable algorithm. In the process we show that any
configuration graph can be obtained.

The first section contains standard definitions and the definition of 
configuration graphs. The second and third sections develop theory used 
in the proof of the algorithm as stated in section four. The algorithm 
is easily applied without knowledge of the underlying theory. Note that
section three is nothing but a copy of results from [1], slightly modified to
suit the present needs.



1 Preliminaries 

1.1 Meeting Notational Needs 

Let $A$ be any nonempty finite set of symbols; we call $A$ our alphabet and its
members letters. By $A^*$ we understand the set of finite words constructed from
the letters of $A$, including the empty word $\epsilon$. Equipped with the
associative composition of concatenation, $A^*$ is the free monoid over $A$. We
furthermore let $A^+ = A^* \setminus \{\epsilon\}$ denote the set of nonempty
words, and for any $u \in A^*$ we let $|u|$ be the length of $u$, i.e., the
number of letters of $u$. Given two words $u$ and $v$ of $A^*$ we say that $u$
is a factor of $v$, denoted $u \dashv v$, if there exist $w_1, w_2 \in A^*$
with $w_1 u w_2 = v$.

We call members of $A^{\mathbb{Z}}$ (two sided) sequences over the alphabet $A$.
Let $x$ be some sequence and let $i \in \mathbb{Z}$; we denote the letter at
index $i$ by $x_{[i]}$. Given an additional $j \in \mathbb{Z}$ with $i \le j$ we
let $x_{[i,j]}$ denote the word consisting of the letters from index $i$ to
index $j$, both included. We define the language of some sequence $x$ to be the
set $\mathcal{L}(x) = \{\epsilon\} \cup \{u \in A^* \mid \exists i,j \in
\mathbb{Z}, i \le j : u = x_{[i,j]}\}$ and call its members factors of $x$. We
define the shift $\sigma : A^{\mathbb{Z}} \to A^{\mathbb{Z}}$ by
$(\sigma(x))_{[i]} = x_{[i+1]}$ for $x \in A^{\mathbb{Z}}$ a sequence and $i$
ranging over $\mathbb{Z}$. Elements of $A^{\mathbb{N}}$ are called one sided
sequences over $A$; subscript notation and the definitions of language, factors
and shift apply to these as well, only the indices range over $\mathbb{N}$ and
not $\mathbb{Z}$. Note, however, that while the shift is bijective on
$A^{\mathbb{Z}}$ it is only surjective on $A^{\mathbb{N}}$.

Let $u$ be any word of $A^*$ and $x$ a one sided sequence; the concatenation
$ux$ is defined the obvious way. Given a two sided sequence $x$ and $i \in
\mathbb{Z}$ we let $x_{]-\infty,i]}$ and $x_{[i,\infty[}$ denote the obvious one
sided sequences. Given, on the other hand, any two single sided sequences $x$
and $y$, we define the two sided sequence $x.y$ by letting $(x.y)_{[i]} =
x_{[-i]}$ for $i < 0$ and $(x.y)_{[i]} = y_{[i+1]}$ for $i \ge 0$, i.e., by
reversing $x$ and concatenating it with $y$, letting the first letter of $y$
have index 0. We shall extend this notation in the obvious way to allow for
finite words between the dot and the one sided sequences. For the sake of an
example, let $x$ and $y$ be one sided sequences and let $a$ be some letter; we
then have that $\sigma(x.ay) = xa.y$.

By a substitution $\tau$ we understand a map $\tau : A \to A^+$. It can be
extended in the obvious way to a map respecting concatenation $\tau : A^* \to
A^*$, furthermore to map single sided sequences to single sided sequences, and,
by specifying $\tau(x.y) = \tau(x).\tau(y)$ for any $x,y \in A^{\mathbb{N}}$, to
map sequences to sequences; we shall not distinguish between a substitution and
its extension. Note that for any $u \in A^*$ we have $|\tau(u)| \ge |u|$ and
that for any two substitutions $\tau_1$ and $\tau_2$ the composition
$\tau_1\tau_2$ defines a substitution as well.
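
The extension of a substitution to finite words, and the composition of a
substitution with itself, can be sketched in a few lines of Python. This is
only an illustration: representing a substitution as a dictionary from
one-character strings to strings, and the names `apply` and `power`, are
assumptions made here and do not come from the paper.

```python
def apply(sub, word):
    """Extend a letter-to-word map to finite words by concatenation."""
    return "".join(sub[letter] for letter in word)


def power(sub, n):
    """Return the n-fold composition of sub with itself (n >= 1)."""
    result = dict(sub)
    for _ in range(n - 1):
        # Apply sub once more to every image computed so far.
        result = {a: apply(sub, image) for a, image in result.items()}
    return result


# Illustrative substitution on the alphabet {0, 1}.
tau = {"0": "10", "1": "0"}
image = apply(tau, "010")
tau_squared = power(tau, 2)
```

Since every letter is sent to a nonempty word, `len(apply(sub, u)) >= len(u)`
holds for every word `u`, in line with the remark above.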

1.2 Primitivity and Aperiodicity: Pretty Interesting Substitutions

In this subsection we introduce the concept of primitivity, the language asso- 
ciated with a substitution, the shift space associated with a substitution and 
finally the concept of aperiodicity. The different properties are easily verified if 
one proceeds in the order they are listed here. 

Definition 1 A substitution $\tau$ is said to be primitive if it holds that

$\exists n \in \mathbb{N}\ \forall a,b \in A : b \dashv \tau^n(a)$

and that

$\exists a \in A\ \forall N \in \mathbb{N}\ \exists n \in \mathbb{N} : |\tau^n(a)| > N.$

Notice that the first of these properties implies the second if we have
$|A| > 1$; indeed the second property does nothing but exclude the substitution
$a \mapsto a$ in a theoretically convenient way.
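
A primitivity test along the lines of Definition 1 can be sketched as follows.
The sketch tracks, for every letter $a$, the set of letters occurring in
$\tau^n(a)$. It is an assumption of this sketch (not something stated in the
paper) that the search may stop after $(|A|-1)^2+1$ iterations; this is the
classical Wielandt bound for primitive nonnegative matrices, applied to the
letter-occurrence matrix of the substitution. The function name is hypothetical.

```python
def is_primitive(sub):
    """Check both conditions of Definition 1 for a dict-based substitution."""
    alphabet = set(sub)
    if len(alphabet) == 1:
        # Only the growth condition matters: exclude a -> a.
        (a,) = alphabet
        return len(sub[a]) > 1
    # Assumed stopping point: Wielandt's bound on the exponent of primitivity.
    bound = (len(alphabet) - 1) ** 2 + 1
    # reach[a] is the set of letters occurring in tau^n(a) after n iterations.
    reach = {a: {a} for a in alphabet}
    for _ in range(bound):
        reach = {a: {c for b in letters for c in sub[b]}
                 for a, letters in reach.items()}
        if all(letters == alphabet for letters in reach.values()):
            return True
    return False


primitive_example = is_primitive({"0": "10", "1": "0"})
non_primitive_example = is_primitive({"1": "2", "2": "3", "3": "3"})
```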

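Proposition 2 Let $\tau$ be any primitive substitution. We have the following
properties:

(i) $\exists n \in \mathbb{N}\ \forall a,b \in A\ \forall i \in \mathbb{N}_0 : b \dashv \tau^{n+i}(a)$

(ii) $\forall a \in A\ \forall N \in \mathbb{N}\ \exists n \in \mathbb{N} : |\tau^n(a)| > N$

(iii) $\exists x \in A^{\mathbb{Z}}\ \exists n \in \mathbb{N} : \tau^n(x) = x$

Now let $\tau$ be some substitution; we define the language of $\tau$ by
$\mathcal{L}(\tau) = \{u \in A^* \mid \exists a \in A\ \exists n \in \mathbb{N} : u \dashv \tau^n(a)\}$.
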
Proposition 3 Let $\tau$ be any substitution. We have the following properties:

(i) $\tau(\mathcal{L}(\tau)) \subseteq \mathcal{L}(\tau)$

(ii) $\forall u,v \in A^* : u \dashv v,\ v \in \mathcal{L}(\tau) \Rightarrow u \in \mathcal{L}(\tau)$

If furthermore $\tau$ is primitive we get that:

(iii) $A \subseteq \mathcal{L}(\tau)$

(iv) $\forall n \in \mathbb{N} : \mathcal{L}(\tau) = \mathcal{L}(\tau^n)$

Consider now the non primitive substitution:

$\tau_d : 1 \mapsto 2,\ 2 \mapsto 3,\ 3 \mapsto 3.$

We obviously have $\mathcal{L}(\tau_d) = \{\epsilon, 2, 3\}$ and
$\mathcal{L}(\tau_d^2) = \{\epsilon, 3\}$, which demonstrates that
primitivity is a necessary condition for the two lower properties.
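
The two languages in this counterexample can be checked mechanically. The
following Python sketch collects all factors of $\tau^n(a)$ for $n$ up to a
fixed depth; in general this only yields a finite part of the language, but
for $\tau_d$ the images stabilise, so the truncation loses nothing here. The
helper names and the dictionary representation are assumptions of the sketch,
and the empty word is written `""`.

```python
def apply(sub, word):
    """Apply a dict-based substitution letter by letter."""
    return "".join(sub[letter] for letter in word)


def language(sub, depth):
    """Factors of tau^n(a) for all letters a and 1 <= n <= depth, plus ''."""
    found = {""}
    for a in sub:
        word = a
        for _ in range(depth):
            word = apply(sub, word)
            found |= {word[i:j]
                      for i in range(len(word))
                      for j in range(i + 1, len(word) + 1)}
    return found


tau_d = {"1": "2", "2": "3", "3": "3"}
tau_d_squared = {a: apply(tau_d, apply(tau_d, a)) for a in tau_d}

lang = language(tau_d, 5)
lang_squared = language(tau_d_squared, 5)
```

The letter 1 is missing from `lang`, so property (iii) fails, and `lang` differs
from `lang_squared`, so property (iv) fails as well.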

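We furthermore define the shift space associated with $\tau$ by
$X_\tau = \{x \in A^{\mathbb{Z}} \mid \mathcal{L}(x) \subseteq \mathcal{L}(\tau)\}$.

Proposition 4 Let $\tau$ be any substitution. We have the following properties:

(i) $\sigma(X_\tau) = X_\tau$

(ii) $\tau(X_\tau) \subseteq X_\tau$

If furthermore $\tau$ is primitive we get that:

(iii) $\forall x \in X_\tau\ \forall u \in \mathcal{L}(x)\ \exists n \in \mathbb{N}_0\ \forall i \in \mathbb{Z} : u \dashv x_{[i,i+n]}$

(iv) $\forall x \in X_\tau : \mathcal{L}(x) = \mathcal{L}(\tau)$

(v) $\forall n \in \mathbb{N} : X_\tau = X_{\tau^n}$

(vi) $X_\tau \ne \emptyset$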

A sequence $x \in A^{\mathbb{Z}}$ is said to be periodic if there exists an
$n \in \mathbb{N}$ such that for all $i \in \mathbb{Z}$ we have $x_{[i]} =
x_{[i+n]}$; $n$ is called the length of the period. Finally let $\tau$ be a
primitive substitution. We say that $\tau$ is periodic if $X_\tau$ is finite.
This is equivalent to $X_\tau$ having a periodic member, which is again
equivalent to all members of $X_\tau$ being periodic. Aperiodicity is obviously
defined as the lack of periodicity for sequences as well as substitutions.

We end this somewhat tedious subsection with a small but handy lemma: 

Lemma 5 Let $\tau$ be a primitive, aperiodic substitution and let $x \in
X_\tau$. We have that

$\forall i,j \in \mathbb{Z} : x_{[i,\infty[} = x_{[j,\infty[} \Rightarrow i = j$

The proof is an easy application of the definitions above, a symmetrical version 
of the lemma also holds. 
1.3 Orbit classes, specials and configuration graphs

Let $\tau$ be a primitive, aperiodic substitution. By definition this implies
that $X_\tau$ is infinite. In this subsection we shall consider the structure of
$X_\tau$; in particular we shall present the concept of a configuration graph
associated to $\tau$, which is a convenient graphical representation of the
infinite indeterminism of $X_\tau$.

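Definition 6 Let $\tau$ be any substitution. Let $x,y \in X_\tau$. We define the
following relations:

(i) $x \sim_o y \Leftrightarrow \exists m \in \mathbb{Z}\ \forall i \in \mathbb{Z} : x_{[i]} = y_{[i+m]}$

(ii) $x \sim_r y \Leftrightarrow \exists m \in \mathbb{Z}\ \exists M \in \mathbb{Z}\ \forall i > M : x_{[i]} = y_{[i+m]}$

(iii) $x \sim_l y \Leftrightarrow \exists m \in \mathbb{Z}\ \exists M \in \mathbb{Z}\ \forall i < M : x_{[i]} = y_{[i+m]}$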

We name these relations orbit equivalence, right tail equivalence respectively 
left tail equivalence and immediately verify that they are indeed equivalence 
relations. The equivalence classes under orbit equivalence are called orbit classes 
and since both right and left tail equivalence respect orbit equivalence they 
define equivalence relations on the orbit classes as well. 

Definition 7 Let $\tau$ be any substitution. A sequence $x \in X_\tau$ is called
left special if there exists $y \in X_\tau$ with

$x_{[-1]} \ne y_{[-1]}$ and $x_{[0,\infty[} = y_{[0,\infty[}$.

An orbit class $C \in X_\tau/\sim_o$ is called left special if there exists an
orbit class $D \in X_\tau/\sim_o$ with $C \ne D$ and $C \sim_r D$.

An easy application of lemma 5 shows that if $\tau$ is primitive and aperiodic
then an orbit class is left special if and only if it contains a left special
sequence. Right special sequences and orbit classes are defined symmetrically.

As mentioned in theorem 1.5 of [1], the number of left as well as right special
orbit classes is finite but nonzero if $\tau$ is primitive and aperiodic. This
makes the following definition meaningful:

Definition 8 Let t be a primitive, aperiodic substitution. The configuration 
graph is a bipartite graph defined as follows: The set of left vertices are the 
equivalence classes of orbit classes under left tail equivalence that contain a 
special orbit class. The set of right vertices are defined symmetrically and each 
special orbit class gives rise to an edge connecting the left and right equivalence 
classes that contain it. 

As an example, the primitive, aperiodic substitution $1 \mapsto 121$,
$2 \mapsto 2112$ has the following configuration graph:

[Figure: configuration graph of the substitution $1 \mapsto 121$, $2 \mapsto 2112$]

The calculation of configuration graphs is by no means a trivial exercise;
indeed an algorithm doing this is the main result of [1]. This algorithm is most
conveniently implemented online, see [2] for details.



2 Generators

Definition 9 Let $\tau$ be any substitution. Let $(v,u,w) \in A^+ \times A^+
\times A^+$. We say that $(v,u,w)$ is a generator for $\tau$ if $u \in
\mathcal{L}(\tau)$ and furthermore $\tau(u) = vuw$. We denote by $G_\tau$ the
set of all generators for $\tau$.

Given a generator $(v,u,w)$ we shall refer to $v$, $u$ and $w$ as the left wing,
the center respectively the right wing to facilitate the language. Furthermore
we shall refer to the length of the center as the length of the generator.

Definition 10 Let $\tau$ be any substitution and let $(v,u,w) \in G_\tau$. We
define the completion of $(v,u,w)$ by

$(v,u,w)^* = \cdots \tau^2(v)\,\tau(v)\,v\,u.w\,\tau(w)\,\tau^2(w) \cdots$

and note that this is a member of $X_\tau$.

This definition is our main justification for working with generators: they
provide a means of creating members of $X_\tau$ and they do so in a nice way, as
we shall see below. But before we start completing let us first impose some
structure on the set of generators.
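
A finite window of a completion is easy to compute. The Python sketch below
returns the words to the left and to the right of the dot after $k$ rounds,
that is $\tau^k(v) \cdots \tau(v)\,v\,u$ and $w\,\tau(w) \cdots \tau^k(w)$.
Because $\tau(u) = vuw$, the two halves glued together equal $\tau^{k+1}(u)$,
which also explains why the completion lies in $X_\tau$. The function names and
the sample substitution (which reappears as an example later in this section)
are used here purely for illustration.

```python
def apply(sub, word):
    """Apply a dict-based substitution letter by letter."""
    return "".join(sub[letter] for letter in word)


def completion_window(sub, generator, k):
    """Return (left of the dot, right of the dot) after k rounds."""
    v, u, w = generator
    left, right = u, ""
    tv, tw = v, w
    for _ in range(k + 1):
        left = tv + left      # prepend tau^i(v)
        right = right + tw    # append tau^i(w)
        tv, tw = apply(sub, tv), apply(sub, tw)
    return left, right


tau = {"0": "042", "1": "142", "2": "042", "3": "043", "4": "01432"}
g = ("01", "4", "32")  # tau(4) = 01 4 32

left, right = completion_window(tau, g, 2)
third_image = apply(tau, apply(tau, apply(tau, "4")))
```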

Definition 11 Let $\tau$ be any substitution and let $(v,u,aw)$ be a generator
with $v,u \in A^+$, $a \in A$ and $w \in A^*$. Then obviously
$(v,ua,w\tau(a))$ is a generator as well and we say it is constructed from the
original by right extension; left extension is defined similarly. We say that
two generators $g_1$ and $g_2$ for $\tau$ are G related (denoted by $g_1 \sim_G
g_2$) if there exists a generator $g_3$ such that $g_3$ can be constructed from
$g_1$ by a series of (possibly zero) right and left extensions and $g_3$ can be
constructed similarly from $g_2$.

One quickly realizes that left as well as right extensions are deterministic,
i.e., any generator can be left or right extended in exactly one way.
Furthermore, right and left extensions are independent since they take place on
different sides of the center, so to speak, and this implies that their order
can be exchanged in a series of mixed extensions. Summing up, the relation
defined above is transitive as well as obviously reflexive and symmetric, i.e.,
it is an equivalence relation.
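
Right and left extension are simple string operations, and the claim that they
commute can be tried out directly. In the Python sketch below a generator is a
triple of strings; the left extension is written out as the mirror image of
the right extension, sending $(v'a,u,w)$ to $(\tau(a)v',au,w)$. The function
names and the sample substitution are illustrative assumptions, and the check
`satisfies_equation` only tests $\tau(u) = vuw$, not membership of the center
in $\mathcal{L}(\tau)$.

```python
def apply(sub, word):
    """Apply a dict-based substitution letter by letter."""
    return "".join(sub[letter] for letter in word)


def right_extend(sub, generator):
    """(v, u, a w) -> (v, u a, w tau(a))."""
    v, u, w = generator
    a, rest = w[0], w[1:]
    return (v, u + a, rest + sub[a])


def left_extend(sub, generator):
    """(v' a, u, w) -> (tau(a) v', a u, w)."""
    v, u, w = generator
    rest, a = v[:-1], v[-1]
    return (sub[a] + rest, a + u, w)


def satisfies_equation(sub, generator):
    """Check tau(center) == left wing + center + right wing."""
    return apply(sub, generator[1]) == "".join(generator)


tau = {"0": "042", "1": "142", "2": "042", "3": "043", "4": "01432"}
g = ("01", "4", "32")

g_right = right_extend(tau, g)
g_left = left_extend(tau, g)
both_orders_agree = (left_extend(tau, g_right) == right_extend(tau, g_left))
```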

Definition 12 Let t be any substitution. We define the basic generators to be 
all generators that are not G related to any shorter generator. 

We shall see shortly that there is exactly one basic generator in each equivalence 
class. But let us pause to consider how we would calculate the basic genera- 
tors of a substitution, this turns out to be very easy in the case of primitive 
substitutions: 

Lemma 13 Let $\tau$ be any substitution and let $g = (v, aub, w)$ be any
generator of two or more letters with $v,w \in A^+$, $u \in A^*$ and $a,b \in
A$. It is basic if and only if $|\tau(a)| > |v|$ and $|\tau(b)| > |w|$.

Proof: Suppose one of the length inequalities fails, say, $|\tau(a)| \le |v|$.
Then we can write $v = \tau(a)v'$ for some $v' \in A^*$ and $(v'a,ub,w)$ is a
generator shorter than $g$ and obviously G related to $g$.

Now suppose both length inequalities hold. Let $n \in \mathbb{N}_0$. We shall
show by complete induction on $n$ that if $g'$ and $g''$ are two more generators
and $g$ can be extended to $g''$ in a series of $n$ extensions and $g'$ can be
extended to $g''$ in another series of extensions, then $g'$ is longer than or
has the same length as $g$. Let $m$ be the number of left extensions among the
$n$ steps and let $m'$ be the number of left extensions in the steps extending
$g'$ to $g''$. If both are nonzero we can remove one left extension from both
series and still end up with a common result, since left and right extensions
commute, and afterwards apply the inductive hypothesis. We cannot have $m = 0$
and $m' > 0$ since the first would let the left length inequality hold for
$g''$ and the second would contradict this. This leaves us with $m \ge m'$ and
since the same argument applies to right extensions we have finished our
inductive argument and the proof. □

Corollary 14 Let t be a primitive substitution. The following holds: 

(i) All one letter generators are basic. 

(ii) Let $(v,ab,w)$ be any two letter generator with $v,w \in A^+$ and $a,b \in
A$. It is basic if and only if $\tau(a) = va$ and $\tau(b) = bw$.

(iii) No generators of three or more letters are basic.

Notice that the primitivity condition is necessary for part (iii) since a non
primitive substitution may have basic generators of any length. Consider for
instance the following non primitive substitution:

$0 \mapsto 01230,\ 1 \mapsto 1,\ 2 \mapsto 2,\ 3 \mapsto 30123.$

This has the generator $(0123,0123,0123)$, which is basic by the lemma, so the
conclusion of part (iii) fails for this substitution.

The set of basic generators of a primitive substitution is very easily
calculated using the corollary: The one letter generators can be read off the
definition of the substitution directly; the two letter generators in question
are all those that can be constructed from mating a one letter empty right wing
"generator" with a one letter empty left wing "generator", bearing in mind that
the center must always be in $\mathcal{L}(\tau)$. As an example consider the
following primitive substitution:

$0 \mapsto 042,\ 1 \mapsto 142,\ 2 \mapsto 042,\ 3 \mapsto 043,\ 4 \mapsto 01432.$

This has the four basic generators $(01,4,32)$, $(04,20,42)$, $(04,21,42)$ and
$(04,30,42)$ and no more; in particular $(04,31,42)$ is not even a generator,
since $31 \notin \mathcal{L}(\tau)$.
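
The recipe just described can be sketched in Python for a primitive
substitution given as a dictionary. The two letter part of the language is
computed as a closure: a two letter factor of $\tau^n(a)$ either lies inside
the image of a single letter or straddles the images of two adjacent letters.
The function names are hypothetical, and the sketch relies on primitivity both
for Corollary 14 and for every letter lying in $\mathcal{L}(\tau)$.

```python
def two_letter_language(sub):
    """All two letter words occurring in some tau^n(a), n >= 1."""
    words = set()
    for image in sub.values():
        words.update(image[i:i + 2] for i in range(len(image) - 1))
    frontier = list(words)
    while frontier:
        a, b = frontier.pop()
        # The only new two letter factor of tau(ab) straddles tau(a)|tau(b).
        straddle = sub[a][-1] + sub[b][0]
        if straddle not in words:
            words.add(straddle)
            frontier.append(straddle)
    return words


def basic_generators(sub):
    """Basic generators of a primitive substitution, following Corollary 14."""
    lang2 = two_letter_language(sub)
    found = set()
    # One letter generators: a occurs strictly inside tau(a).
    for a, image in sub.items():
        for i in range(1, len(image) - 1):
            if image[i] == a:
                found.add((image[:i], a, image[i + 1:]))
    # Two letter generators: tau(a) = v a, tau(b) = b w, and ab in the language.
    for a, ia in sub.items():
        for b, ib in sub.items():
            if (len(ia) > 1 and len(ib) > 1 and ia[-1] == a
                    and ib[0] == b and a + b in lang2):
                found.add((ia[:-1], a + b, ib[1:]))
    return found


tau = {"0": "042", "1": "142", "2": "042", "3": "043", "4": "01432"}
generators = basic_generators(tau)
```

For the example substitution the word 31 never enters the closure, which is why
the candidate with center 31 is rejected.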

The following proposition justifies the basic generators as being, in essence, all 
generators: 

Proposition 15 Let r be any substitution. We then have: 
(i) No two different basic generators are G related. 
(ii) Any generator is G related to a unique basic generator.

Proof: The proof of (i) proceeds similarly to the proof of the second part of the 
lemma, i.e., complete induction on the number of steps required to extend g to 
some generator that another basic generator can be extended to as well. Com- 
mon left extensions are handled by the inductive hypothesis and left extensions 
in only one of the extension series are contradicted by the lemma. The proof of 
(ii) is immediate by induction on the length of the generator by the definition 
of basic generators; the uniqueness is a spinoff from part (i). □ 

It is now time to consider how these structures on $G_\tau$ interact with the
completion of members of $G_\tau$. The following result is a pretty one:

Proposition 16 Let $\tau$ be a primitive, aperiodic substitution and let
$g_1,g_2 \in G_\tau$. We have that

$g_1^* \sim_o g_2^* \Leftrightarrow g_1 \sim_G g_2.$

Proof: The arrow leading left is immediate since left and right extension
preserve completion up to orbit equivalence.

Assume now that $g_1^* \sim_o g_2^*$. Assume initially that $g_1^* = g_2^*$. Let
$n_1,n_2 \in \mathbb{N}$ be the length of the right wing of $g_1$ respectively
$g_2$. Since

$\sigma^{-n_1}(\tau(g_1^*)) = g_1^* = g_2^* = \sigma^{-n_2}(\tau(g_2^*)),$

aperiodicity ensures that $n_1 = n_2$. This immediately implies that if $g_1$
and $g_2$ are of equal length then they are equal, and if they are not, then the
shorter can be left extended to obtain the longer. If $g_1^* \ne g_2^*$ then
there must exist a $p \in \mathbb{Z}$, $p \ne 0$, such that $\sigma^p(g_1^*) =
g_2^*$. In case $p > 0$ then by performing $p$ right extensions of $g_1$ we are
in the situation above. The case $p < 0$ is handled by right extending $g_2$. □

With the construction of specials in mind, the following result is promising:

Proposition 17 Let $\tau$ be a primitive, aperiodic substitution and let
$g_1,g_2 \in G_\tau$. We have that $g_1^* \sim_r g_2^*$ holds if and only if
there exist two generators $g_1' \sim_G g_1$ and $g_2' \sim_G g_2$ with
identical right wings.

Proof: Assume that $g_1^* \sim_r g_2^*$ holds. If we have the luck that $g_1^*
\sim_o g_2^*$ then by proposition 16 there exists a $g'$ with $g_1 \sim_G g'$
and $g_2 \sim_G g'$, and letting $g_1' = g'$ and $g_2' = g'$ concludes the case.
If, on the other hand, $g_1^* \not\sim_o g_2^*$ holds then we have the existence
of $p,j \in \mathbb{Z}$ such that

$\forall i \ge j : \sigma^p(g_1^*)_{[i]} = g_2^*{}_{[i]}$

and

$\sigma^p(g_1^*)_{[j-1]} \ne g_2^*{}_{[j-1]}.$

Assume initially that $p = 0$. If we further assume that $j \le 0$, then we can
halfway duplicate the calculations from the proof of proposition 16. Let
$n_1,n_2 \in \mathbb{N}$ be the length of the right wing of $g_1$ respectively
$g_2$. We now get:

$\sigma^{-n_1}(\tau(g_1^*))_{[n_2,\infty[} = g_1^*{}_{[n_2,\infty[} = g_2^*{}_{[n_2,\infty[} = \sigma^{-n_2}(\tau(g_2^*))_{[n_2,\infty[}$

This by lemma 5 is enough to ensure that $n_1 = n_2$, which proves that the two
generators have identical right wings. Now if $j > 0$ then we perform $j$ right
extensions on both generators and proceed as above; this concludes the case
$p = 0$. And as above, if $p > 0$ then we do $p$ right extensions of $g_1$, if
$p < 0$ then we do $-p$ right extensions of $g_2$, and in both cases proceed as
in the case $p = 0$. The reverse is immediate. □

Suppose we are given two basic generators $g_1$ and $g_2$ with $g_1^*
\not\sim_o g_2^*$ and would like to know whether $g_1^* \sim_r g_2^*$. The
proposition above tells us to look for G related generators with identical right
wings, but this is not an algorithmically very pleasant task. The proof above
shows, however, that $g_1'$ and $g_2'$, if they exist at all, can be constructed
by doing nothing but right extensions of $g_1$ respectively $g_2$. After
possibly undoing some pairwise identical right extensions we can furthermore
obtain generators with identical right wings that disagree on either the
rightmost letter of the center or the letter just before that. If now
additionally $\tau$ is regular, then this puts a maximum limit to the length of
the desired common right wing, thereby making the test for $g_1^* \sim_r g_2^*$
a finite story. Let us list an even simpler and most useful case:

Corollary 18 Let $\tau$ be a primitive, aperiodic, postfix free substitution and
let $g_1,g_2 \in G_\tau$ with $g_1^* \not\sim_o g_2^*$. We have that $g_1^*
\sim_r g_2^*$ holds if and only if the right wings of $g_1$ and $g_2$ are
identical.

A final note to conclude this section: The definition of the completion of a 
generator is not entirely symmetrical with respect to the left and right wings 
of the generator. The given definition has the pleasant property that right 
extending the generator shifts the completion one step; we rely heavily on this 
in the proofs above. On the other hand, one might fear that this would introduce 
some asymmetry to completions. This, however, is not the case as long as we 
stick to orbit classes. Indeed, the symmetrical versions of both proposition 17
and corollary 18 above hold; this is most easily checked by shifting to opposite
substitutions.



3 Generating specials 

Definition 19 Let $\tau$ be any substitution. The leftmost letter graph (the ll
graph) is defined to be the graph with the letters of $A$ as vertices and with
one directed edge leaving each vertex $a \in A$ arriving at the leftmost letter
of $\tau(a)$. The rightmost letter graph (the rl graph) is defined similarly.

Definition 20 Let $\tau$ be any substitution and let $n \in \mathbb{N}$. We say
that $n$ is a left segregating number if for any two words $u,v \in
\mathcal{L}_n(\tau)$ with differing leftmost letter we have that the length of
the common prefix of $\tau(u)$ and $\tau(v)$ is less than or equal to
$\min\{|\tau(u)|, |\tau(v)|\} - n$. Right segregating numbers are defined
similarly.

Note that not all substitutions have segregating numbers. Consider for instance
the following primitive, aperiodic substitution:

$\tau_e : a \mapsto c,\ b \mapsto c,\ c \mapsto db,\ d \mapsto ca.$

Squaring this we get a substitution with the two generators $(d, bca, cdb)$ and
$(ca, cdb, cdb)$. This implies that for any $n \in \mathbb{N}$ there exists $u
\in A^*$ with $|u| = n-1$ and $au, bu \in \mathcal{L}_n(\tau_e)$, which shows
that $n$ cannot be a left segregating number since we have that $\tau_e(au) =
\tau_e(bu)$. On the other hand, note that for any prefix free substitution 1
will do as left segregating number; similarly any postfix free substitution has
1 as right segregating number. We say that a substitution is segregating if it
has both a left and a right segregating number. As is often the case, regular
substitutions behave nicely:

Proposition 21 Let $\tau$ be any primitive, regular substitution. Then $\tau$ is
segregating.

Proof: We prove only the existence of the left segregating number; the right
case is symmetrical. Since $\tau$ is primitive there must exist an $a \in A$
with $|\tau(a)| > 1$. By minimality there exists an $s \in \mathbb{N}$ such that
any $u \in \mathcal{L}_s(\tau)$ contains $a$. Now
let 

It now follows from theorem 1.6 in [3] that $s(P - |A| + Q - 1)$ is a left
segregating number. □

Definition 22 Let $\tau$ be any substitution with a left segregating number. Let
$n \in \mathbb{N}$ be the least such. We define the left segregating graph (the
ls graph) as follows: The vertices are all pairs of words from
$\mathcal{L}_n(\tau)$ which differ at their leftmost letter. One directed edge
leaves each vertex; if the vertex is $(u,v)$ then the destination is obtained by
removing the common prefix from $\tau(u)$ and $\tau(v)$ and reading the leftmost
$n$ letters from each remaining word. The right segregating graph (the rs graph)
is defined similarly for a substitution with a right segregating number.
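
The construction in Definition 22 can be sketched in Python. The sketch takes
the number $n$ as input and assumes, without checking, that it is a segregating
number for the chosen side; the right hand case is obtained by reversing words.
Words of length at most $n$ in the language are found by a closure: every such
factor of $\tau^m(a)$ is a factor of $\tau(w)$ for some already known word $w$
of length at most $n$. All function names, and the sample substitution
$0 \mapsto 10$, $1 \mapsto 0$, are illustrative choices.

```python
def apply(sub, word):
    """Apply a dict-based substitution letter by letter."""
    return "".join(sub[letter] for letter in word)


def factors_up_to(word, n):
    """Nonempty factors of word of length at most n."""
    return {word[i:j] for i in range(len(word))
            for j in range(i + 1, min(i + n, len(word)) + 1)}


def language_up_to(sub, n):
    """Nonempty words of length <= n occurring in some tau^m(a), m >= 1."""
    found, frontier = set(), set()
    for image in sub.values():
        frontier |= factors_up_to(image, n)
    while frontier:
        w = frontier.pop()
        found.add(w)
        frontier |= factors_up_to(apply(sub, w), n) - found
    return found


def segregating_graph(sub, n, side):
    """ls graph for side == 'left', rs graph otherwise (n assumed valid)."""
    flip = (lambda w: w) if side == "left" else (lambda w: w[::-1])
    words = [w for w in language_up_to(sub, n) if len(w) == n]
    graph = {}
    for u in words:
        for v in words:
            if flip(u)[0] == flip(v)[0]:
                continue  # vertices must differ at the outermost letter
            iu, iv = flip(apply(sub, u)), flip(apply(sub, v))
            k = 0  # length of the common prefix (postfix on the right side)
            while k < min(len(iu), len(iv)) and iu[k] == iv[k]:
                k += 1
            graph[(u, v)] = (flip(iu[k:k + n]), flip(iv[k:k + n]))
    return graph


tau = {"0": "10", "1": "0"}
ls_graph = segregating_graph(tau, 1, "left")
rs_graph = segregating_graph(tau, 2, "right")
```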

It is time for an example; consider the following primitive, aperiodic, regular
substitution:

$\tau_f : 0 \mapsto 10,\ 1 \mapsto 0.$

The ll and rl graphs are as follows:

ll: $0 \to 1$, $1 \to 0$        rl: $0 \to 0$, $1 \to 0$

As left segregating number 1 will do, and clearly it is the least such. On the
other hand, 2 is the least right segregating number. Since
$\mathcal{L}_1(\tau_f) = \{0,1\}$ and $\mathcal{L}_2(\tau_f) = \{00,01,10\}$ we
get the following ls and rs graphs:

ls: $(0,1) \to (1,0)$, $(1,0) \to (0,1)$

rs: $(00,01) \to (01,10)$, $(10,01) \to (01,10)$,
    $(01,00) \to (10,01)$, $(01,10) \to (10,01)$

We say that any of the graphs defined above is subfixed if for each vertex
$v$, $v$ either loops to itself (i.e., the edge leaving $v$ goes back to $v$) or
the edge leaving $v$ goes to some other vertex that loops to itself. Of the
graphs in the example above only the rl graph is subfixed. It is, however, the
case that for any segregating substitution $\tau$ there exists an $n \in
\mathbb{N}$ such that all the graphs ll, rl, ls and rs for $\tau^n$ are
subfixed. To realize this, notice first that if $\tau$ is segregating then so is
any nonzero power of $\tau$. Then note that raising the power of $\tau$ by one
corresponds to extending each edge by its immediate successor in any of the
graphs above. Finally let $m$ be the least common multiple of the lengths of all
cycles in all the graphs (each must have at least one cycle if $\tau$ is
primitive and aperiodic). Then raising $\tau$ to the power of any positive
multiple of $m$ ensures that all vertices that are in cycles of the original
graph now loop to themselves, and by choosing a sufficiently high multiple we
can make all other vertices connect to one of these vertices. In the simple
example above choosing $n = 2$ will work, i.e., for $\tau_f^2$ all the graphs
ll, rl, ls and rs are subfixed. The following theorem is our main justification
for this as well as the preceding section:

Theorem 23 Let $\tau$ be any primitive, aperiodic, segregating substitution with
all the graphs ll, rl, ls and rs subfixed. Then for any left or right special
sequence $u \in X_\tau$ there exists a generator $g \in G_\tau$ such that $g^*
\sim_o u$.
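
For the ll and rl graphs the subfixed hypothesis is easy to test by machine. In
the Python sketch below a graph with one outgoing edge per vertex is stored as
a dictionary from vertex to target, so "subfixed" simply means that every
target is a fixed point. The function names are hypothetical; the sample
substitution is $0 \mapsto 10$, $1 \mapsto 0$.

```python
def apply(sub, word):
    """Apply a dict-based substitution letter by letter."""
    return "".join(sub[letter] for letter in word)


def power(sub, n):
    """Return the n-fold composition of sub with itself (n >= 1)."""
    result = dict(sub)
    for _ in range(n - 1):
        result = {a: apply(sub, image) for a, image in result.items()}
    return result


def letter_graph(sub, side):
    """ll graph for side == 'left', rl graph otherwise."""
    index = 0 if side == "left" else -1
    return {a: image[index] for a, image in sub.items()}


def is_subfixed(graph):
    """Every vertex loops, or points at a vertex that loops."""
    return all(graph[target] == target for target in graph.values())


tau = {"0": "10", "1": "0"}
tau2 = power(tau, 2)

ll_ok = is_subfixed(letter_graph(tau, "left"))
rl_ok = is_subfixed(letter_graph(tau, "right"))
ll2_ok = is_subfixed(letter_graph(tau2, "left"))
rl2_ok = is_subfixed(letter_graph(tau2, "right"))
```

Squaring turns the two-cycle of the ll graph into two loops, which matches the
general argument about raising the substitution to a multiple of the cycle
lengths.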

To prove this, consider first the following lemma: 

Lemma 24 Let $\tau$ be any primitive, aperiodic substitution with a right
segregating number and with the rs graph subfixed. Suppose we have $u,v \in
X_\tau$ with $u_{[0,\infty[} = v_{[0,\infty[}$ and $u_{[-1]} \ne v_{[-1]}$. Then
there exist $u',v' \in X_\tau$ with $u'_{[0,\infty[} = v'_{[0,\infty[}$ and
$u'_{[-1]} \ne v'_{[-1]}$ and

$u_{[-n,-1]} = u'_{[-n,-1]}, \quad v_{[-n,-1]} = v'_{[-n,-1]}$

and

$u = \sigma^{-p}(\tau(u')), \quad v = \sigma^{-p}(\tau(v')),$

where $n \in \mathbb{N}$ is the least right segregating number and $p \in
\mathbb{N}_0$ is the length of the common postfix of $\tau(u'_{[-n,-1]})$ and
$\tau(v'_{[-n,-1]})$.

Proof of lemma: By corollary 12 of 0] there exist $x,y \in X_\tau$ with $u
\sim_o \tau(x)$ and $v \sim_o \tau(y)$. By lemma 3.1 of we get that $x \sim_r
y$. But since $u \not\sim_o v$ we also have $x \not\sim_o y$ and we may choose
$u' \sim_o x$ and $v' \sim_o y$ with $u'_{[0,\infty[} = v'_{[0,\infty[}$ and
$u'_{[-1]} \ne v'_{[-1]}$. There exist $p,q \in \mathbb{N}_0$ such that $u =
\sigma^{-p}(\tau(u'))$ and $v = \sigma^{-q}(\tau(v'))$, but it follows from
lemma 5 that $p = q$ and we can furthermore deduce that these must equal the
length of the common postfix of $\tau(u'_{[-n,-1]})$ and $\tau(v'_{[-n,-1]})$.
Now repeat this exercise to produce $u''$ and $v''$ with $u''_{[0,\infty[} =
v''_{[0,\infty[}$ and $u''_{[-1]} \ne v''_{[-1]}$ and with $u' =
\sigma^{-r}(\tau(u''))$, $v' = \sigma^{-r}(\tau(v''))$, where $r$ is the length
of the common postfix of $\tau(u''_{[-n,-1]})$ and $\tau(v''_{[-n,-1]})$. Now
going from $(u'',v'')$ to $(u',v')$ and on to $(u,v)$ makes the pair of words at
index $[-n,-1]$ change according to the rs graph, and since this is subfixed we
have that $u_{[-n,-1]} = u'_{[-n,-1]}$ and $v_{[-n,-1]} = v'_{[-n,-1]}$ as
desired. □

Proof of theorem: We assume that $u$ is left special; the right case is, as is
often the case, symmetrical. By definition there must exist $v \in X_\tau$ with
$u_{[0,\infty[} = v_{[0,\infty[}$ and $u_{[-1]} \ne v_{[-1]}$. Now let $n \in
\mathbb{N}$ be the least right segregating number, let $p \in \mathbb{N}_0$ be
the length of the common postfix of $\tau(u_{[-n,-1]})$ and $\tau(v_{[-n,-1]})$
and let $r \in \mathbb{N}_0$ be $|\tau(u_{[-n,-1]})| - p - n$. Now suppose both
$p$ and $r$ are nonzero. Then choose

$g = (u_{[-n-r,-n-1]},\ u_{[-n,-1]},\ u_{[0,p-1]}).$

If on the other hand $r$ is zero and $p$ nonzero we choose

$g = (u_{[-n-s-1,-n-2]},\ u_{[-n-1,-1]},\ u_{[0,p-1]}),$

where $s = |\tau(u_{[-n-1]})| - 1$ which is nonzero. If finally $p$ is zero and
$r$ nonzero we choose

$g = (u_{[-n-r,-n-1]},\ u_{[-n,0]},\ u_{[1,s]}),$

where $s = |\tau(u_{[0]})| - 1$ which is nonzero as well. Note that due to
primitivity, we cannot have both $p$ and $r$ zero. The theorem now follows in
each case from iterating lemma 24, making use of the fact that the rl graph is
subfixed in the second case and that the ll graph is subfixed in the third
case. □

Let us shortly consider the usefulness of this result: Given a substitution it is 
often easy to find some special sequences using generators, e.g., any two gener- 
ators with identical right wings but disagreeing letters in the center complete to 
left special sequences modulo orbit equivalence. On the other hand, this result 
tells us that under certain circumstances all special sequences can be obtained in 
this way. And since the results from the previous section give us some measure
of control over the generators, we are now in a better position to face the
special sequences of a substitution. One possible application could be to
calculate special sequences and thereby configuration graphs for arbitrary
substitutions, but this is already done very well in [1]; indeed the present
section steals heavily from this source. Instead we shall use our results to
produce certain substitutions with desirable properties such as having a
particular configuration graph; this is the object of the next section.



4 The Zorro Algorithm 

4.1 Miscellaneous Tools 

This subsection contains miscellaneous minor results that are needed in the
proof of the Zorro Algorithm. While the results are (probably) true, they may
appear unmotivated and rather out of context. Do not worry though, all will
be clear in due time.

Lemma 25 Let $\tau$ be any primitive substitution. If $\tau$ has either a left
or a right special sequence then it is aperiodic.

Proof: Suppose it has a left special sequence; this provides us with sequences
$x,y \in X_\tau$ with $x_{[-1]} \ne y_{[-1]}$ and $x_{[0,\infty[} =
y_{[0,\infty[}$. Assume now that $\tau$ is periodic; this implies that $x$ and
$y$ are each periodic. Let $n, m$ be the lengths of their periods. But then both
sequences are periodic with periods of length $nm$ as well, which is an obvious
contradiction. □
Proposition 26 Let $\tau$ be any substitution with subfixed ll and rl graphs. We
have that

$\mathcal{L}_2(\tau) = \{u \in A^2 \mid \exists a \in A : u \dashv \tau(a)\} \cup \{u \in A^2 \mid \exists a \in A : u \dashv \tau^2(a)\}.$

Proof: Any member of the right hand side is a member of the left hand side
by definition. Now let $u \in \mathcal{L}_2(\tau)$; by definition we have $a \in
A$ and $n \in \mathbb{N}$ with $u \dashv \tau^n(a)$ and we may choose $a$ and
$n$ such that $n$ is minimal. Assume for the sake of contradiction that $n \ge
3$. This implies that there can be no letter $b \dashv \tau^{n-1}(a)$ with $u
\dashv \tau(b)$, nor any letter $b \dashv \tau^{n-2}(a)$ with $u \dashv
\tau^2(b)$. But this again implies that there exist $v,w \in A^+$ with $vw =
\tau^{n-2}(a)$ and with $u = \mathrm{rl}(\tau^2(v))\,\mathrm{ll}(\tau^2(w))$.
But since the ll and rl graphs are subfixed we have that

$\mathrm{rl}(\tau^2(v))\,\mathrm{ll}(\tau^2(w)) = \mathrm{rl}(\tau(v))\,\mathrm{ll}(\tau(w)),$

which implies the contradiction $u \dashv \tau^{n-1}(a)$. □

This result can be generalized to word lengths higher than 2. We are, however, 
more interested in the following corollary: 

Corollary 27 Let $\tau$ be any substitution with subfixed ll and rl graphs. Let
$W = \{u \in A^2 \mid \exists a \in A : u \dashv \tau(a)\}$.

We have that

$\mathcal{L}_2(\tau) = W \cup \{\mathrm{rl}(\tau(a))\,\mathrm{ll}(\tau(b)) \mid ab \in W\}.$

4.2 The Theorem and the Algorithm

Definition 28 A bipartite graph is said to be undecided if it has the following
properties:

(i) There are no lonely vertices, i.e., any vertex has one or more outgoing
edges.

(ii) There are no lonely edges, i.e., for any edge there exists another edge
sharing one or both of its vertices.

(iii) There exists a left vertex with two outgoing edges.

(iv) There exists a right vertex with two outgoing edges.
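
The four conditions are straightforward to test. In the Python sketch below a
bipartite multigraph is given as a list of (left vertex, right vertex) pairs,
with parallel edges listed repeatedly; since the vertex sets are read off the
edge list, condition (i) holds automatically as soon as there is at least one
edge. This input format and the function name are assumptions of the sketch.

```python
from collections import Counter


def is_undecided(edges):
    """Check Definition 28 for a bipartite multigraph given as an edge list."""
    edges = list(edges)
    if not edges:
        return False
    left_degree = Counter(left for left, _ in edges)
    right_degree = Counter(right for _, right in edges)
    # (ii): an edge is lonely when both of its end points have degree one.
    no_lonely_edges = all(left_degree[left] > 1 or right_degree[right] > 1
                          for left, right in edges)
    # (iii) and (iv): some vertex on each side carries two or more edges.
    left_branch = max(left_degree.values()) >= 2
    right_branch = max(right_degree.values()) >= 2
    return no_lonely_edges and left_branch and right_branch


zigzag = [("a", "x"), ("b", "x"), ("b", "y")]
single_edge = [("a", "x")]
fork = [("a", "x"), ("a", "y")]
double_edge = [("a", "x"), ("a", "x")]
```

The last example shows that, under the definition as stated, a pair of
parallel edges between two vertices already satisfies all four conditions.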

We say that a primitive aperiodic substitution realizes its configuration graph 
and in general that a bipartite graph is realizable if there exists a primitive, 
aperiodic substitution realizing it. The following theorem is the conclusion to 
much of our work: 

Theorem 29 A bipartite graph is realizable if and only if it is undecided.
Indeed, for any bipartite undecided graph the Zorro algorithm described below
will compute a primitive, aperiodic substitution realizing it.

Proof: Note initially that by the definitions and results of subsection 1.3 it
is immediate that any realizable graph is undecided. To prove the other way
round, we shall first state the Zorro algorithm with a few examples and then
afterwards verify that it actually produces the desired substitutions.

Consider the following three bipartite graphs: 



[Figure: the three bipartite graphs Z, W and E]



Now let G be any bipartite undecided graph. It follows from parts (iii) and 
(iv) of the definition that G must contain one or more of the above graphs 
as a subgraph. The algorithm has three cases corresponding to these three 
subgraphs, each of these cases proceeds according to the following common 
recipe but with slightly differing ingredients¹:

1. The first part simply states an initial substitution that realizes the given 
subgraph. The alphabet has one letter corresponding to each vertex in 
the subgraph but also contains additional letters that do not correspond 
to vertices. The following three steps will gradually extend the initial 
substitution such that the final result realizes G. 

2. Remaining vertices are added now: For each vertex in G not in the subgraph,
we add a new letter to our alphabet. The value of our substitution at each of
these new letters is assigned according to the left and right patterns for
left respectively right vertices. To be precise, the value of a new letter
corresponding to a left vertex is obtained by postfixing the word produced
by the left pattern with the new letter itself; right letters are treated
symmetrically.

3. Then the first edges: For each pair of vertices that are presently uncon- 
nected but are connected in G we add the first (possibly only) edge by 
inserting the two letter word consisting of the two letters corresponding 
to the left respectively right vertex at the insertion point specified as part 
of the initial substitution. 

4. And finally the remaining edges: For any two vertices that are already 
connected but lack the number of edges present in G, we add a new letter 
to our alphabet for each missing edge. The value of the substitution at 
such a new letter is obtained by taking first the value of the substitution 
at the letter corresponding to the left vertex minus the rightmost letter, 
then adding the new letter and finally the value of the substitution at the 
letter corresponding to the right vertex minus the leftmost letter. All new 
letters produced in this step are finally added directly as one letter words 
at the insertion point. 

¹Incidentally, the algorithm is named after the particular shape of the Z graph; this was 
the first case solved. 

As a start, let us specify the initial substitution with insertion point and left 
and right patterns in the case of the subgraph Z, which is the easiest case: 

    1 ↦ 22451 
    2 ↦ 245133 
    3 ↦ 2224513 
    4 ↦ 451333 
    5 ↦ 222245 | 13333 

    left pattern  : 2...2 45   (5, 6, 7, ... copies of 2) 
    right pattern : 51 3...3   (5, 6, 7, ... copies of 3) 


A few words on the notation: The insertion point is specified by a vertical line, 
in this case in the middle of the value of 5. As a theoretical convenience we have 
highlighted those letters in the values that are identical to the source letter; this 
is of no importance when applying the algorithm. The patterns produce words of 
increasing length, i.e., the first word produced by the left pattern in this case is 
2222245, the next 22222245 and so on. Finally note that the letters 1 through 
4 correspond to the vertices of Z whereas the letter 5 does not correspond to 
any vertex. 
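
The two patterns of case Z are easy to mechanize. The following Python sketch is only an illustration: the function names and the zero-based counter k are our own choices and do not appear in the text.

```python
def left_pattern_z(k: int) -> str:
    """k-th word (k = 0, 1, 2, ...) of the left pattern: 5 + k twos, then 45."""
    return "2" * (5 + k) + "45"


def right_pattern_z(k: int) -> str:
    """k-th word of the right pattern: 51, then 5 + k threes."""
    return "51" + "3" * (5 + k)


def new_left_vertex_value(letter: str, k: int) -> str:
    # The pattern word is postfixed with the new letter itself.
    return left_pattern_z(k) + letter


def new_right_vertex_value(letter: str, k: int) -> str:
    # Symmetric rule: the new letter is prefixed to the pattern word.
    return letter + right_pattern_z(k)


print(left_pattern_z(0), left_pattern_z(1))  # 2222245 22222245
print(new_right_vertex_value("6", 0))        # 65133333
```

Each new vertex consumes the next word of its pattern, which is why the counter k grows with every vertex added on the same side.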

An example is due, indeed we should very much like to realize the following 
graph: 




Luckily, it is undecided. Initially we need to identify which of the three graphs 
are contained in this graph. As it happens, both Z and E are subgraphs. For 
didactic reasons we choose to carry on with Z, but choosing E would have 
produced a realizing substitution as well. We then have an initial substitution, 
so step one of the algorithm is complete and leaves us with 
the following substitution and its configuration graph: 

    1 ↦ 22451 
    2 ↦ 245133 
    3 ↦ 2224513 
    4 ↦ 451333 
    5 ↦ 222245 | 13333 

[Figure: the configuration graph of this substitution, the graph Z.] 

Notice two things here: The highlighted symbols and the insertion point in the 
substitution are of course not a part of the substitution but rather theoretically 
convenient layout, just as the letters labeling the vertices. Also notice that Z 
does not occur as a subgraph of our graph in an unambiguous way; indeed we could 
have chosen to let the vertices of Z coincide with all vertices except the lower 
right instead. This, like the choice between Z and E at step 1, does not matter: 
all choices will produce realizing, if not necessarily identical, substitutions. As 
for step two, we need to introduce one more vertex; this is done by adding 
the letter 6 to our alphabet and assigning it the value 65133333 in accordance 
with the right pattern since it is a right vertex. We now have the following 
substitution and corresponding graph as conclusion to step 2: 

    1 ↦ 22451 
    2 ↦ 245133 
    3 ↦ 2224513 
    4 ↦ 451333 
    5 ↦ 222245 | 13333 
    6 ↦ 65133333 



Notice about this step that while the substitution above corresponds to the 
graph in algorithmic terms it does not realize it. This is a slight inconvenience 
that applies to step two only; essentially it is caused by adding lonely vertices 
to the original graph and thereby wreaking havoc upon its undecidability. As 
for step three, we need to add just one edge between the vertices 1 and 6. This 
is easily done by adding the two letter word 16 at the insertion point: 

    1 ↦ 22451 
    2 ↦ 245133 
    3 ↦ 2224513 
    4 ↦ 451333 
    5 ↦ 22224516 | 13333 
    6 ↦ 65133333 
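
Steps 2 and 3 of this example can be replayed in a few lines of Python. This is a minimal sketch under our own conventions: the value of the letter 5 is kept as a pair of halves so that the insertion point stays explicit, and the variable names are hypothetical.

```python
# Initial substitution for case Z; the value of 5 is stored as (left, right)
# halves around the insertion point.
values = {"1": "22451", "2": "245133", "3": "2224513", "4": "451333"}
pivot = ("222245", "13333")

# Step 2: the new right vertex 6 takes the first word of the right pattern,
# prefixed with the letter 6 itself.
values["6"] = "6" + "51" + "3" * 5

# Step 3: the first edge between vertices 1 and 6 is the two letter word 16,
# written at the insertion point, i.e. appended to the left half.
pivot = (pivot[0] + "16", pivot[1])
values["5"] = pivot[0] + pivot[1]

print(values["6"])  # 65133333
print(pivot)        # ('22224516', '13333')
```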



Finally, we need to add two more edges between already connected vertices: 
One more between vertices 1 and 6 and the final one between the vertices 3 and 4. 
The first is added by introducing the new letter 7 and assigning it the value 
2245 followed by 7 itself followed by 5133333, i.e., the unlikely long value of 
224575133333. Similarly the final edge is added by introducing the letter 8 and 
assigning it the value 222451851333. Both new letters are added at 
the insertion point and the fourth and final step of the algorithm is complete: 

    1 ↦ 22451 
    2 ↦ 245133 
    3 ↦ 2224513 
    4 ↦ 451333 
    5 ↦ 2222451678 | 13333 
    6 ↦ 65133333 
    7 ↦ 224575133333 
    8 ↦ 222451851333 
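
The rule of step 4 is a simple string operation. The sketch below, in Python and with a hypothetical helper name `extra_edge_value`, starts from the substitution as it stands after step 3 and reproduces the values of the letters 7 and 8 given above.

```python
def extra_edge_value(values: dict, left: str, right: str, new: str) -> str:
    """Value of `left` minus its rightmost letter, then the new letter,
    then the value of `right` minus its leftmost letter."""
    return values[left][:-1] + new + values[right][1:]


# State after step 3; the value of 5 is kept as two halves around the
# insertion point.
values = {"1": "22451", "2": "245133", "3": "2224513",
          "4": "451333", "6": "65133333"}
pivot_left, pivot_right = "22224516", "13333"

# One more edge between 1 and 6 (letter 7), one more between 3 and 4 (letter 8).
for left, right, new in [("1", "6", "7"), ("3", "4", "8")]:
    values[new] = extra_edge_value(values, left, right, new)
    pivot_left += new  # each new letter is inserted as a one letter word

print(values["7"])                     # 224575133333
print(values["8"])                     # 222451851333
print(pivot_left + " | " + pivot_right)  # 2222451678 | 13333
```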

The example concluded, let us now state the initial substitution etc. for the 
remaining two cases. First the case of the subgraph W: 



    1 ↦ 423761 
    2 ↦ 237651 
    3 ↦ 376551 
    4 ↦ 43765551 
    5 ↦ 4223765 
    6 ↦ 4222376 
    7 ↦ 223747 |* 17655 

    left pattern  : 4 2...2 37   (4, 5, 6, ... copies of 2) 
    right pattern : 76 5...5 1   (4, 5, 6, ... copies of 5) 

As hinted by the star next to the insertion point, there is one peculiarity to this 
case as compared to the two others: All words inserted at the insertion point, 
whether at step three or four in the algorithm, need to be followed by the letter 
7, e.g., if the algorithm tells us to insert the words 53, 8 and 9 at the insertion 
point, then we need to insert 5378797 and not just their concatenation 5389 as 
we would in the other two cases. This is caused, in a sense, by the graph W 
being disconnected; the symbol 7 works as a bridge between the parts. The final 
case of the subgraph E completes the definition of the algorithm: 



    1 ↦ 2534251 
    2 ↦ 2513451 
    3 ↦ 2534253 
    4 ↦ 4513451 
    5 ↦ 251134 | 342251 

    left pattern  : 25 1...1 34   (3, 4, 5, ... copies of 1) 
    right pattern : 34 2...2 51   (3, 4, 5, ... copies of 2) 
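
Two details of the cases W and E lend themselves to a short Python sketch: the extra letter 7 that follows every word inserted in case W, and the patterns of both cases. All function names and the zero-based counter k are our own and purely illustrative.

```python
def insert_at_point_w(left_half: str, words: list) -> str:
    """Case W only: every word inserted at the insertion point is followed by 7."""
    return left_half + "".join(word + "7" for word in words)


def left_pattern_w(k: int) -> str:
    return "4" + "2" * (4 + k) + "37"


def right_pattern_w(k: int) -> str:
    return "76" + "5" * (4 + k) + "1"


def left_pattern_e(k: int) -> str:
    return "25" + "1" * (3 + k) + "34"


def right_pattern_e(k: int) -> str:
    return "34" + "2" * (3 + k) + "51"


print(insert_at_point_w("", ["53", "8", "9"]))  # 5378797
print(left_pattern_w(0) + "8")                  # 42222378
```

In the cases Z and E the corresponding insertion is plain concatenation of the words.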



As a conclusion to our description of the algorithm we provide two more examples, 
one for each of the graphs W and E. We shall not go into the same level of 
detail as before, but rather just present the desired graphs and then state the 
results of running the algorithm. The two undecided graphs we would like to 
realize are: 




The first contains the graph W and running the algorithm gives us the following 
result: 



    1 ↦ 423761 
    2 ↦ 237651 
    3 ↦ 376551 
    4 ↦ 43765551 
    5 ↦ 4223765 
    6 ↦ 4222376 
    7 ↦ 223747187947 |* 17655 
    8 ↦ 87655551 
    9 ↦ 42222379 



Notice here how the words 18 and 94 are followed by the letter 7 as specified 
above. The final example gives the following result: 

    1 ↦ 2534251 
    2 ↦ 2513451 
    3 ↦ 2534253 
    4 ↦ 4513451 
    5 ↦ 25113467890 | 342251 
    6 ↦ 25111346 
    7 ↦ 73422251 
    8 ↦ 2534258513451 
    9 ↦ 251113493422251 
    0 ↦ 251113403422251 





Having thus stated and exemplified the algorithm it is time to prove that it 
does indeed produce a primitive, aperiodic substitution realizing a given graph. 
We shall not go through all painstaking details three times. Instead, we list the 
properties that need to be verified for all cases and for each of these properties 
describe the general strategy used to verify it. And we shall, of course, verify a 
few of these properties in full detail for some of the cases. 

The first issue to consider is that of primitivity. This is fundamental to all our 
workings and luckily it holds easily for all substitutions since they all contain 
a particular letter with the property that its value contains the entire alphabet 
and it is itself contained in the value of all letters. This is the letter with the 
insertion point. Notice that this property also holds after adding additional 
letters according to step 2 since these are all added at the insertion point in 
step 3 by part (i) of the definition of an undecided graph. The next basic issue is 
aperiodicity, but this is easily handled by lemma l^ since left or right special se- 
quences are easily constructed from generators in all the initial substitutions. As 
an example, the generators (253425, 12, 513451) and (253425, 34, 513451) from 
case E provide us with both left and right special sequences. 
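
The primitivity argument is easy to check mechanically: if some letter p has a value containing the whole alphabet and p occurs in every value, then the second iterate of every letter contains every letter. The Python sketch below searches for such a letter in the initial substitution of case Z; the function name `pivot_letters` is our own.

```python
def pivot_letters(sub: dict) -> list:
    """Letters whose value contains every letter and which occur in every value."""
    alphabet = set(sub)
    return sorted(
        a for a in alphabet
        if set(sub[a]) == alphabet and all(a in value for value in sub.values())
    )


# Initial substitution of case Z (insertion point omitted from the value of 5).
initial_z = {"1": "22451", "2": "245133", "3": "2224513",
             "4": "451333", "5": "22224513333"}

print(pivot_letters(initial_z))  # ['5']
```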

Having dealt with the basics, we now check that the produced substitutions are 
prefix as well as postfix free; this implies that they are segregating with least left 
and right segregating numbers both 1. With this in mind, we furthermore verify 
that all the four graphs ll, rl, ls and rs are subfixed. This is where the weird 
patterns used in step 2 are justified since they ensure that these properties, which 
hold for the initial substitutions, are maintained through the steps 2, 3 and 4 
of the algorithm. Take as an example the case W: The initial substitution is 
easily prefix and postfix free and some checking shows that the four graphs are 
all subfixed. Now let us add a left vertex as an example of the effects of step 2; we 
get 8 ↦ 42222378. On the right hand side the new unique letter 8 protects from 
trouble. And the left pattern ensures not only that the substitution remains 
prefix free but also that the ll and in particular the ls graph remain subfixed. 
Step 3 changes nothing and the letters introduced in step 4 also end up being 
compatible with the state of affairs. Taking some time to verify these things 
also gives some idea of why the produced substitutions tend to be lengthy. 
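
The prefix and postfix conditions can be tested by brute force. The Python sketch below assumes that "prefix free" means that no value is a prefix of the value of another letter, and symmetrically for "postfix free"; this reading of the definitions is an assumption on our part, as are the function names. It is applied to the initial substitution of case W extended by 8 ↦ 42222378.

```python
from itertools import permutations


def is_prefix_free(sub: dict) -> bool:
    # Assumed definition: no value is a prefix of the value of another letter.
    return not any(v.startswith(u) for u, v in permutations(sub.values(), 2))


def is_postfix_free(sub: dict) -> bool:
    # Assumed definition: no value is a suffix of the value of another letter.
    return not any(v.endswith(u) for u, v in permutations(sub.values(), 2))


# Initial substitution of case W plus one new left vertex (letter 8).
w_extended = {"1": "423761", "2": "237651", "3": "376551", "4": "43765551",
              "5": "4223765", "6": "4222376", "7": "22374717655",
              "8": "42222378"}

print(is_prefix_free(w_extended), is_postfix_free(w_extended))  # True True
```

The subfixedness of the four graphs is not covered by this check.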

We now have primitive, aperiodic, segregating substitutions with the four graphs 
ll, rl, ls and rs subfixed. And indeed, we are going strong, these are exactly the 
prerequisites of theorem 23. The next consideration is to identify the set of basic 
generators for each substitution and from these verify that the desired graph 
is actually realized. Let us consider an example to simplify things: Letting 
τ be the initial substitution in the case Z we easily get by corollary that 
12, 32, 34 ∈ L_2(τ) but 14 ∉ L_2(τ), which again easily gives us the following 
basic generators: 

(22224,5,13333), (2245,12,45133), (222451,32,45133), (222451,34,51333). 

This immediately implies that the orbit classes containing the completions of the 
three last generators are special and by proposition 16 different. Furthermore, 
by theorem 23 and corollary 18 these are the only special orbit classes. And by 
corollary ^] the completions of the second and third are right tail equivalent 
whereas the completion of the fourth isn't right tail equivalent with any of the 
others; similarly the completions of the third and fourth are left tail equivalent 
but the second is excluded. Summing up, we have proved that the initial sub- 
stitution actually does realize the Z graph, and in general that, because of our 
careful preparations above, the configuration graph is easily read off from the 
set of basic generators. 
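
The claim about the two letter words of the initial substitution in case Z can be confirmed by a small fixed point computation. The Python sketch below is our own illustration (the function name is hypothetical): a two letter word of the language either sits inside the value of a single letter, or straddles the seam between the values of two letters that themselves form a two letter word of the language.

```python
def two_letter_words(sub: dict) -> set:
    """Two letter factors of the iterates of a primitive substitution."""
    found = set()
    # Factors lying inside the value of a single letter.
    todo = {v[i:i + 2] for v in sub.values() for i in range(len(v) - 1)}
    while todo:
        word = todo.pop()
        if word in found:
            continue
        found.add(word)
        a, b = word
        # The image of ab contributes exactly one factor across the seam.
        todo.add(sub[a][-1] + sub[b][0])
    return found


initial_z = {"1": "22451", "2": "245133", "3": "2224513",
             "4": "451333", "5": "22224513333"}

words = two_letter_words(initial_z)
print(sorted(words))
print({"12", "32", "34"} <= words, "14" in words)  # True False
```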

The general idea is now that any left vertex corresponds to a letter with a 
value consisting of a unique left part not containing the letter itself followed 
by the letter. This correspondence is set up in the initial substitution and is 
maintained through step 2 by the left pattern. The situation is symmetrical for 
the right vertices. Step 2 does thus not in itself produce any new generators, 
since the centers of the potential generators are not in the language yet. This 
setup makes the adding of edges at step 3 very easy though: just extend the 
language by adding words at the insertion point, only we have to take some care 
in case W not to introduce unwanted generators. Note that by part 
(ii) of the definition of an undecided graph we are ensured that all edges share 
a vertex with some other edge, this ensures that the generators we add in this 
step become special and thus actually figure in the graph. At step 4 we want to 
add an additional edge between already connected vertices, this is easily done 
by introducing a new generator with left and right wings corresponding to the 
vertices but with a new center, and remembering to add it to the language. 
To satisfactorily verify the algorithm one of course needs to check very carefully 
that no unwanted two letter words enter the language during the steps 2 through 
4, since this would give an undesired edge, we shall refrain from doing this in 
writing. □ 
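
As a sanity check, and not as a substitute for the careful verification, one can iterate the final substitution of the first example and list which pairs of a left vertex letter and a right vertex letter occur as two letter words. In the Python sketch below the split into left letters 1, 3 and right letters 2, 4, 6 is read off from the example; the names and the iteration depth 3 are our own choices.

```python
# Final substitution of the first example (insertion point omitted).
final = {"1": "22451", "2": "245133", "3": "2224513", "4": "451333",
         "5": "222245167813333", "6": "65133333",
         "7": "224575133333", "8": "222451851333"}


def iterate(sub: dict, word: str, n: int) -> str:
    """Apply the substitution n times to the given word."""
    for _ in range(n):
        word = "".join(sub[c] for c in word)
    return word


text = iterate(final, "5", 3)
seen = {text[i:i + 2] for i in range(len(text) - 1)}

left_letters, right_letters = "13", "246"
present = sorted(l + r for l in left_letters for r in right_letters
                 if l + r in seen)
print(present)  # ['12', '16', '32', '34']
```

The pairs 14 and 36 do not show up, in agreement with the desired graph; the additional edges between 1 and 6 and between 3 and 4 are carried by the letters 7 and 8 rather than by new two letter words.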